Destination Setup

Data pipeline setup in DataStori is a three-step process:

  • Configure Ingestion
  • Scheduling and Data Load
  • Destination Setup

This article covers the final step of setting up the destination.

DataStori writes data to blob storage (Azure, AWS, or GCP) and to a user-defined database location.

Example: To fetch the General Ledger from NetSuite (via API) every day, the destination is configured as follows:

Blob Setup

  • Cloud Provider - Select one of Azure, AWS or GCP.

  • Root Storage Location - The root folder or bucket where the data is to be kept. In the above case, we have a separate location for NetSuite, which is the source-level folder.

  • Pipeline data root folder - This is a dedicated space for the General Ledger. We may fetch other data from NetSuite, and this level is the data-specific folder.

  • Pipeline Execution Data folder - This space holds execution-specific data. We have named it {{query_date}}, which is a dynamic variable; a new folder is created for each new {{query_date}} (see the path sketch after this list).
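Putting the three levels together, each run resolves to a path of the form root / pipeline folder / {{query_date}}. A minimal sketch of how such a path could be assembled; the folder names and the build_blob_path helper are illustrative, not part of DataStori:

```python
from datetime import date


def build_blob_path(root: str, pipeline_folder: str, query_date: date) -> str:
    """Resolve the {{query_date}} variable into a concrete execution folder path."""
    return f"{root}/{pipeline_folder}/{query_date.isoformat()}"


# A General Ledger run for 2024-01-15 would land in a folder such as
# netsuite/general_ledger/2024-01-15 (folder names are illustrative).
print(build_blob_path("netsuite", "general_ledger", date(2024, 1, 15)))
```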

The blob output view corresponding to the above definition is as follows:

The other folders in the blob are:

  • deduped - contains deduped data. DO NOT delete this folder as it is used for incremental loads.

  • final - contains flattened deduped data.

    • final/csv - contains flattened deduped data in CSV format.
    • final/csv/single_file.csv - contains the merged, non-partitioned CSV file. This data is loaded into the SQL database; you can also read it directly from here instead of from the database (see the sketch after this list).
  • schema - contains the schema evolution information of the dataset, tracking its evolution from the starting schema to the current one.

  • {root}/metadata - contains the temporary encrypted API credentials (OAuth2 access tokens) needed to make API calls.
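As noted above, the merged single_file.csv can be read directly from the blob instead of from the database. A minimal sketch for Azure, assuming the azure-storage-blob and pandas packages; the connection string, container name, and blob path are placeholders to be replaced with the values from your own Blob Setup:

```python
import io

import pandas as pd
from azure.storage.blob import BlobClient

# Placeholder connection string, container and blob path: substitute the
# values from your own Blob Setup (root, pipeline and final/csv folders).
blob = BlobClient.from_connection_string(
    conn_str="<azure-storage-connection-string>",
    container_name="datastori",
    blob_name="netsuite/general_ledger/final/csv/single_file.csv",
)

# Download the merged, non-partitioned CSV and load it into a DataFrame.
df = pd.read_csv(io.BytesIO(blob.download_blob().readall()))
print(df.head())
```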

Database Setup

DataStori uses SQLAlchemy to write data to the destination database. The supported databases are listed here: https://docs.sqlalchemy.org/en/20/dialects/index.html

If your chosen database is not present in this list, please write to contact@datastori.io to have it added.
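Because the destination is addressed through a standard SQLAlchemy connection URL, any database with a dialect in the list above can be reached the same way. A minimal sketch using a placeholder PostgreSQL URL (the host, credentials, and database name are illustrative):

```python
from sqlalchemy import create_engine, text

# Placeholder PostgreSQL URL; any database with a SQLAlchemy dialect
# (see the list linked above) can serve as the destination.
engine = create_engine("postgresql+psycopg2://user:password@db-host:5432/analytics")

with engine.connect() as conn:
    # A trivial round trip to confirm the destination is reachable.
    print(conn.execute(text("SELECT 1")).scalar())
```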

DataStori uses the final CSV single file (final/csv/single_file.csv in the blob setup) to truncate and load the database table, which serves as the user-facing front end for the data in the blob.
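Conceptually, this step reads single_file.csv and replaces the contents of the destination table with the latest snapshot. A rough sketch of the same pattern using pandas and SQLAlchemy; the connection URL, file path, and table name are illustrative, and this is not DataStori's internal code:

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection URL and table name.
engine = create_engine("postgresql+psycopg2://user:password@db-host:5432/analytics")

# Read the merged CSV produced under final/csv in the blob.
df = pd.read_csv("single_file.csv")

# Replace the table contents with the latest snapshot. if_exists="replace"
# drops and recreates the table; an explicit TRUNCATE followed by an append
# is an alternative when the table definition must be preserved.
df.to_sql("general_ledger", engine, if_exists="replace", index=False)
```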

For details on data flattening and schema management, please see here.